ABSTRACT
an individual candidate, cut score variation did result in
significant changes in classification. When cut score changes as 
small as five points were made, reclassification of candidates did 
occur. Because of vulnerability to litigation, many companies have 
turned to rank ordering employees and selecting a specific top 
portion. However, employer vulnerability still exists with regard to 
methods used in setting cut scores and classifying candidates. Five 
tables and two figures illustrate various types of classification. 
Personnel classification procedures have been challenged in court cases, particularly in
the past ten years (Barrett, Alexander, Anesgart & Doverspike, 1986; Bloom & Killingsworth,
1982; Paetzold, 1991). Many controversies have been initiated under federal legislation
(Equal Employment Opportunity Commission; Title VII of the Civil Rights Act of 1964; and, more
recently, the Americans with Disabilities Act of 1990). Traditionally, the appropriateness of a
multiple regression equation has been the basis of many prima facie cases of discrimination
(Barrett et al., 1986; Bloom & Killingsworth, 1982).

Successful court litigation has turned on a variety of factors that contribute to
the acceptance or rejection of a particular method of analysis. These include the selection of predictor
variables, omission of predictor variables, use of proxy variables, dummy coding,
multicollinearity, sample size, method of entry in multiple regression models, cut scores and
inequality of subtests (Barrett et al., 1986; Cohen & Cohen, 1983; Kriska & Milligan, 1982;
Mays, 1976; Morris, 1981; Paetzold, 1992; Rozeboom, 1989).

To avoid litigation, many companies have resorted to classification
schemes that use cut scores and ranking systems. Often the individual job candidate is rank
ordered among competing applicants. The candidate's rank position may then depend heavily
on the number of applicants as well as on the final decision about cut scores for application tests.
The purpose of this paper, therefore, is to explore the impact of three of these conditions: cut
scores, sample size and the overall classification scheme.
Cut Scores 

When test cut scores are set arbitrarily and used as input to classification schemes, several
cautionary procedures should be followed. The American Psychological Association (1990) has
prescribed standards for setting cut scores, and preestablished standards for test publishers outline
the use of test scores for personnel classification and selection. First,
cautionary statements concerning the likelihood of misclassification should be reported for score
levels at or near the cut-off score. Furthermore, job analysis and job classification should be
based on actual patterns of predictor scores. The method chosen for setting cut scores should
also be presented in the test manual.

If professional judges have been used to set cut scores, the qualifications of the 
professional judges should be included. These considerations are cited as 'primary' standards 
for educational and psychological testing. However, the obligation to produce reasonable 
evidence of fairness in test cut scores rests with the employer. 

Scioly (1992) further cites the need for consistency in the decisions made. Very often,
the validity and reliability of the test instruments are not linked to the decision scores. Expectancy
tables based on validity coefficients and test-retest reliability coefficients (r) identify the accuracy of classification,
particularly for dichotomous variables. A frequently used set of expected-accuracy values indexed by
selection ratio is known as the Taylor-Russell tables. Other measures of classification
accuracy are sensitivity, hit rate, and kappa. The main intent is to increase true
positives while minimizing false positives and false negatives (see Figure 1). Further
investigation of the relationship between reliability and classification accuracy is needed.
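The accuracy measures named above can be computed from a 2 x 2 table of selection decisions against later outcomes. The following is a minimal Python sketch with hypothetical counts; the class name `Confusion` is illustrative, and "hit rate" is taken here to mean the overall proportion of correct decisions, which is one common usage rather than a definition given in this paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a hypothetical 2 x 2 table: selection decision by actual outcome."""
    tp: int  # selected and successful (true positive)
    fp: int  # selected but unsuccessful (false positive)
    fn: int  # rejected but would have succeeded (false negative)
    tn: int  # rejected and would have failed (true negative)

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        # Proportion of truly qualified candidates who were selected.
        return self.tp / (self.tp + self.fn)

    def hit_rate(self) -> float:
        # Assumed definition: proportion of all decisions that were correct.
        return (self.tp + self.tn) / self.n

    def kappa(self) -> float:
        # Cohen's kappa: observed agreement corrected for chance agreement.
        p_obs = self.hit_rate()
        p_sel = (self.tp + self.fp) / self.n   # proportion selected
        p_pos = (self.tp + self.fn) / self.n   # proportion actually qualified
        p_chance = p_sel * p_pos + (1 - p_sel) * (1 - p_pos)
        return (p_obs - p_chance) / (1 - p_chance)


table = Confusion(tp=40, fp=10, fn=15, tn=35)  # hypothetical counts
print(round(table.sensitivity(), 3), table.hit_rate(), round(table.kappa(), 3))
```

With these counts the sketch gives a sensitivity of about 0.727, a hit rate of 0.75 and a kappa of about 0.5; reducing false negatives raises both sensitivity and hit rate.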

The regulations set forth in the Americans with Disabilities Act (ADA) of 1990 further
increased employers' exposure to liability in personnel selection procedures, including the
setting of cut scores and rank orders. For instance, if a cut score would
penalize a protected group, liability could ensue under ADA guidelines.
Questions for Investigation 

Based on the above considerations, this study will explore four questions: a) How does a
classification scheme affect individual candidates, given the applicant pool? b) What effect does
sample size have on rank ordering candidates? c) What effect does changing cut scores have on
borderline candidates? d) If accommodations are needed for a disabled applicant, does
reclassification occur with the assistance of the accommodation?

Method 

Two sample groups, of 120 and 73 cases, were used for this analysis. Data
were classified and analyzed as a function of sample size, quartiles and cut scores. The
two data sets are reviewed below.
Data Set One 

The first sample group consisted of 120 simulated observations of employee scores
based on actual selection procedures used for applicants for administrative assistant positions.
The actual candidate selection process was based on observed scores from four measures used to
test applicants. The cut score for the English and Math test was 135, a behavioral role-playing task
received a cut score of 20 points, and a personal interview required a minimum score of 10 points.
A typing test required a score of 45 words per minute for the candidate to be placed in
the final selection pool, although no points were assigned for typing in the combined cut score
used for final selection. The final combined cut-off score for selection was then set at 165
(135 + 20 + 10). Candidates who obtained a score above 165 would be placed in the selection pool.
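The Data Set One rules can be expressed as a single eligibility check. This is a sketch under two assumptions not stated in the text: each component minimum must be met individually, and the combined cut of 165 is treated as inclusive even though the text says "above 165". The dictionary keys and function name are hypothetical.

```python
# Hypothetical encoding of the Data Set One selection rules.
CUTS = {"english_math": 135, "role_play": 20, "interview": 10}
TYPING_MIN_WPM = 45
COMBINED_CUT = 165  # equals the sum of the three component cuts


def in_selection_pool(scores: dict[str, float], typing_wpm: float) -> bool:
    """Return True if a candidate qualifies for the final selection pool."""
    # Assumption: every component minimum must be met (a conjunctive rule).
    meets_components = all(scores[k] >= cut for k, cut in CUTS.items())
    # Typing is a gate only; it contributes no points to the combined score.
    meets_typing = typing_wpm >= TYPING_MIN_WPM
    # Assumption: the combined cut is treated as inclusive (>= 165).
    combined = sum(scores[k] for k in CUTS)
    return meets_components and meets_typing and combined >= COMBINED_CUT


print(in_selection_pool({"english_math": 140, "role_play": 22, "interview": 12}, 50))
print(in_selection_pool({"english_math": 133, "role_play": 25, "interview": 15}, 50))
```

The second candidate has a combined score of 173 but still fails, because 133 is below the English and Math cut; under a purely compensatory rule the same candidate would pass.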

Upon inspection of Table 1, it should be noted that candidates falling within
classification '2' (five points above the cut-off score) represented approximately 10% of the sample
pool across observations. Because these same candidates fell within the 2nd quartile for score
ranking, they may be rejected, even though their total test scores met the basic criteria for selection.
Furthermore, candidates classified as '2' represented 5% of the total candidate pool across
samples. Based on classification and 1st and 2nd quartile ranking, a total of 15% of a given
candidate pool may be rejected unnecessarily.
Data Set Two

A database from a Central Florida municipality for the position of Staff Assistant I
was analyzed. A total of 73 applicants were required to take a two-part examination. Part I
consisted of 44 questions concerning spelling and filing skills and a problem requiring the proper
construction of a memo. Part II of the exam consisted of an error detection test requiring
applicants to apply knowledge of proper English to proofread a written memo. A total score
of 70 on Parts I and II combined was the minimum requirement for this personnel
test. A typing test graded pass-fail required a minimum score of 45 for
the candidate to be placed on the eligibility list. Additional points are added to a
candidate's total score provided the candidate meets minimum standards: five points are added for veterans
and 0.5 points are added per year of seniority.
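The Data Set Two scoring rules can be sketched the same way, assuming the written total is simply Part I plus Part II and that bonus points are applied only after both the written and typing minimums are met. The function and parameter names are hypothetical.

```python
from typing import Optional

TOTAL_MIN = 70                  # combined Part I + Part II minimum
TYPING_MIN = 45                 # pass-fail typing threshold
VETERAN_BONUS = 5.0
SENIORITY_BONUS_PER_YEAR = 0.5


def final_score(part1: float, part2: float, typing: float,
                veteran: bool = False, years_seniority: float = 0.0) -> Optional[float]:
    """Return the eligibility-list score, or None if minimum standards are not met."""
    total = part1 + part2  # assumption: the written total is the sum of both parts
    if total < TOTAL_MIN or typing < TYPING_MIN:
        return None  # not placed on the eligibility list
    # Bonus points are added only after the minimum standards are met.
    bonus = (VETERAN_BONUS if veteran else 0.0) + SENIORITY_BONUS_PER_YEAR * years_seniority
    return total + bonus


print(final_score(40, 38, 50, veteran=True, years_seniority=4))
print(final_score(40, 25, 60))
```

For example, `final_score(40, 38, 50, veteran=True, years_seniority=4)` gives 78 + 5 + 2 = 85.0, while a written total of 65 returns `None` regardless of bonus eligibility.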

Provisions are made for candidates requiring testing accommodations. Candidates
meeting minimum requirements are then rank ordered within the candidate pool for that test
wave. The top candidates are selected based on the number of job openings available at a given
time. An overview of this database and analysis is available in Table 2.
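Rank ordering and selecting the top candidates for the available openings reduces to a sort. This sketch assumes ties are broken by candidate ID; the source does not say how ties are handled, and the IDs and scores are hypothetical.

```python
def select_top(eligible: dict[int, float], openings: int) -> list[int]:
    """Rank eligible candidates by final score (highest first) and keep the top `openings`."""
    # Assumption: ties are broken by candidate ID so the result is deterministic.
    ranked = sorted(eligible.items(), key=lambda item: (-item[1], item[0]))
    return [cid for cid, _ in ranked[:openings]]


print(select_top({101: 88.5, 102: 92.0, 103: 88.5, 104: 75.0}, openings=2))
```

Here candidates 101 and 103 tie at 88.5; under the assumed rule, candidate 103 is rejected by the tie-break alone, which illustrates how sensitive a strict rank-order cut can be at the margin.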

To investigate classification for selection, four selection categories
were formed. The score points analyzed were 125, 130, 135 and over. A score below 125 was
classified '0', a score within five points below the cut-off was classified '1', a total score within five points above the cut-off
was classified '2', and all higher-scoring candidates were classified '3'. Random sampling
of candidates was carried out with the RANUNI random-number function in the SAS statistical package.
Sample sizes and the number of iterations were varied: 50 samples of 20, 25 samples of 40, 15 samples of 60, 10 samples of 100 and 10
samples of 120 were drawn. For each combination of sample size and iterations, the number of
times each candidate was selected was counted (see Table 3).
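The repeated random sampling plan can be sketched in Python, with the standard `random` module standing in for SAS RANUNI. Sampling with replacement is an assumption here, since samples of 100 and 120 exceed the 73-applicant pool; the candidate IDs and the function name are hypothetical.

```python
import random
from collections import Counter

# (iterations, sample size) pairs as reported in the text.
PLAN = [(50, 20), (25, 40), (15, 60), (10, 100), (10, 120)]


def selection_counts(pool: list[int], iterations: int, n: int, seed: int = 0) -> Counter:
    """Count how often each candidate ID is drawn across repeated random samples."""
    rng = random.Random(seed)  # Python's PRNG stands in for SAS RANUNI here
    counts: Counter = Counter()
    for _ in range(iterations):
        # Assumption: draws are with replacement, since n can exceed the pool size.
        counts.update(rng.choices(pool, k=n))
    return counts


pool = list(range(2850, 2923))  # 73 hypothetical candidate IDs
for iterations, n in PLAN:
    counts = selection_counts(pool, iterations, n)
    print(n, iterations, sum(counts.values()), counts.most_common(1))
```

Each row of output reports the sample size, the number of iterations, the total number of draws and the most frequently drawn candidate, which is the kind of per-candidate frequency summarized in Table 3.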



One difference in the classification program for this database is that an 'or'
statement was used to assign candidates to categories. That is, candidates needed to meet minimum
requirements on Part I or Part II to be classified in category 1 or 2. Therefore, a
candidate might fall within five points below or above the cut-off on one test but still be classified as a
'2'. Similarly, a candidate falling five points below the cut-off would be classified as a '1'.
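The 'or' rule can be sketched as follows. The test names, cut-offs and the precedence of '2' over '1' when the two tests disagree are assumptions; the per-part scores behind Table 4 are not reported, so this sketch does not reproduce that table.

```python
from typing import Optional


def band(score: float, cut: float, width: float = 5.0) -> str:
    """Locate a score relative to its cut-off: just above, just below, or neither."""
    if cut <= score <= cut + width:
        return "just_above"   # at the cut-off or up to `width` points above it
    if cut - width <= score < cut:
        return "just_below"   # up to `width` points short of the cut-off
    return "other"


def or_category(scores: dict[str, float], cuts: dict[str, float]) -> Optional[int]:
    """Hypothetical 'or' rule: one test in a band is enough to assign the category."""
    bands = [band(scores[test], cut) for test, cut in cuts.items()]
    # Assumption: category 2 takes precedence when the tests disagree.
    if "just_above" in bands:
        return 2
    if "just_below" in bands:
        return 1
    return None  # falls outside categories 1 and 2


cuts = {"part_1": 35, "part_2": 35}  # hypothetical cut-offs
print(or_category({"part_1": 37, "part_2": 32}, cuts))
print(or_category({"part_1": 33, "part_2": 50}, cuts))
```

The first candidate is just below the cut-off on Part II yet lands in category 2 because Part I is just above; an 'and' rule would have required both parts to qualify.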

In addition to classification by cut score, candidates were classified into the
25th, 50th and 75th percentile ranges for rank ordering based on total score. A Fisher's exact
test was computed on a frequency chart of the top-quartile candidates for the five
sample-size iterations. The results were non-significant for rank order regardless of
sample size.
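Fisher's exact test sums hypergeometric probabilities over all tables with the observed margins that are no more probable than the observed table. The sketch below handles only the 2 x 2 case, a simplification of the comparison across five sample sizes described above; the counts used are a textbook-style illustration, not data from this study.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins held fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_value = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Count every table no more probable than the observed one (small float tolerance).
        if p <= p_obs * (1 + 1e-7):
            p_value += p
    return min(p_value, 1.0)


print(fisher_exact_2x2(8, 2, 1, 5))
```

`fisher_exact_2x2(8, 2, 1, 5)` returns 280/8008, about 0.035. For tables larger than 2 x 2, an exact test over all tables with fixed margins, or a permutation approximation, would be needed.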
Classification Scheme

The classification scheme placed candidates within five categories: a) a rating of '0' was
assigned to candidates falling more than five points below the cut-off; b) a rating of '1'
indicated a total score on one of the two-part tests falling within five points below the cut-off score; c)
'2' indicated candidates who fell within five points above the cut-off; d) '3' indicated
candidates above the five-point range over minimum standards; and e) '4' indicated
candidates falling above all other classifications.
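The five-category scheme maps a score to a category relative to its cut-off. The text does not define the boundary for category '4', so `top_cut` below is a hypothetical parameter (it should exceed the cut-off plus five points), and whether the band edges are inclusive is an assumption.

```python
def classify(score: float, cut: float, top_cut: float, band: float = 5.0) -> int:
    """Map a score to categories 0-4 relative to a cut-off score."""
    if score < cut - band:
        return 0  # more than five points below the cut-off
    if score < cut:
        return 1  # within five points below the cut-off
    if score <= cut + band:
        return 2  # at the cut-off or within five points above it
    if score < top_cut:
        return 3  # beyond the five-point band above minimum standards
    return 4      # at or above the hypothetical top threshold


print([classify(s, cut=70, top_cut=90) for s in (60, 66, 70, 75, 80, 95)])
```

With a cut-off of 70 and a hypothetical `top_cut` of 90, scores of 60, 66, 70, 75, 80 and 95 fall in categories 0, 1, 2, 2, 3 and 4.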
Rank-Ordered Quartiles

To assess the appropriateness of rank ordering individuals for selection, a
subset sample of the 120 applicants was extracted (see Figure 1). Inspection of the frequency
table indicated that 28% of the applicant pool were classified in the 3rd quartile yet met all
minimum entry requirements by test scores. If this is a usual occurrence in applicant pools
when rank ordering is used for selection, more than one-quarter of the pool, although qualified, would be
falsely rejected. The real-world impact of rejecting 28% of the pool who are qualified cannot be
ignored. The costs of testing time and of staffing assessment centers are substantial, and
recruitment, file review, and evaluation of a new candidate pool would become redundant
expenses.
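The quartile analysis can be sketched by ranking total scores and counting candidates who meet the cut-off yet fall in a given quartile. The sketch numbers quartiles so that 4 is the top 25% and splits tied scores by sort order; both conventions are assumptions, and the scores are hypothetical.

```python
def quartiles(scores: list[float]) -> list[int]:
    """Assign each score a rank-order quartile, 1 (lowest 25%) to 4 (highest 25%)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, ascending by score
    result = [0] * n
    for rank, i in enumerate(order):
        result[i] = rank * 4 // n + 1  # ranks 0..n-1 spread evenly over 1..4
    return result


def share_qualified_in_quartile(scores: list[float], cut: float, quartile: int) -> float:
    """Fraction of the whole pool that meets the cut-off but sits in `quartile`."""
    q = quartiles(scores)
    hits = sum(1 for s, qi in zip(scores, q) if s >= cut and qi == quartile)
    return hits / len(scores)


scores = [60, 65, 70, 72, 75, 78, 80, 85]  # hypothetical total scores
print(share_qualified_in_quartile(scores, cut=70, quartile=3))  # 0.25
```

For these eight scores, both 3rd-quartile candidates exceed the cut-off of 70, so a quarter of the pool is qualified but would be rejected if only the top quartile were selected.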

Sample Size and Frequency 

To assess this matter another way, the upper-quartile candidates within
the 120 sample-size group were extracted from the others (see Table 3). A Fisher's exact
test computed on these frequencies across sample sizes and iterations was non-significant:
selection frequency did not differ significantly with random sample size.
Two Criteria Bias 

Classification categories 1 and 2 were also analyzed further. Table 4 depicts
the individual candidates within these categories for a sample size of 20 with 50 iterations. Inspection of the
total score values identifies candidates who passed one test measure by a comfortable margin
but fell one or two points below the cut-off on a second test. The typing test was most often
the test on which candidates fell short, by merely one or two points. Given test anxiety and unfamiliarity
with particular typewriter equipment, it would appear that another group of candidates may be
erroneously classified in the non-selection category, thus driving the false negative rate higher.
Candidate #2894 in the category '1' section (see Table 4) is a vivid
example of this problem. The candidate has more than sufficient basic skills for mental
processing of tasks, yet missed the classification level by only three points on the typing test. The
standard error of measurement may account for this discrepancy, and reclassification of that
candidate may be warranted.
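The classical standard error of measurement is SEM = SD * sqrt(1 - r), where r is the test's reliability. The typing test's SD and reliability are not reported, so the values below (SD = 8, r = 0.84, giving SEM = 3.2) are hypothetical; the check flags a score that falls below the cut-off by no more than one SEM.

```python
from math import sqrt


def sem(sd: float, reliability: float) -> float:
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)


def within_sem_below(score: float, cut: float, sd: float,
                     reliability: float, k: float = 1.0) -> bool:
    """True if the score misses the cut-off by no more than k SEMs."""
    return cut - k * sem(sd, reliability) <= score < cut


# Hypothetical typing-test statistics: SD = 8, reliability = 0.84, so SEM is about 3.2.
print(within_sem_below(42, cut=45, sd=8, reliability=0.84))  # True
print(within_sem_below(41, cut=45, sd=8, reliability=0.84))  # False
```

Under these assumed values, a typing score of 42 against a cut-off of 45 is within one SEM of passing and would be flagged for review rather than rejected outright, while 41 would not.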
Candidate Test Scores and Accommodations

To provide an in-depth look at the overall classification scheme for each
category, Table 5 is provided. Classification number and frequency of observations are included
for each sample size and number of iterations. Similar classification of candidates was found across sample
sizes. One candidate who required testing accommodations (candidate #2937)
obtained a total score of 84 and a typing score of 49 and was classified as '2'. Whether the
large-print administration made the difference in earning a passing grade would be difficult to determine;
however, given the accommodation, the candidate did fall within five points above minimum
standards.

Cut Score Variation 

A final analysis was conducted by lowering the cut scores for both the written test and the
typing test. Class frequencies were recomputed for ten iterations of sample size 120
for Data Set Two, and both cut-score levels were plotted (see Figure 2). The
upper right quadrant signifies true positives. Each plotting letter stands for a value given by its
position in the alphabet (A = 1, B = 2, and so on). Plots 'a' and 'b' clearly depict the differences in
classifications for borderline applicants.
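Reclassification under lowered cut scores amounts to comparing pass/fail decisions at two cut levels. This sketch assumes a candidate must pass both the written and typing tests, and uses hypothetical scores with cut-offs of 70 and 45 lowered by five points.

```python
def passes(written: float, typing: float, cuts: tuple[float, float]) -> bool:
    """Assumed conjunctive rule: both the written and typing cut-offs must be met."""
    written_cut, typing_cut = cuts
    return written >= written_cut and typing >= typing_cut


def reclassified(candidates: dict[int, tuple[float, float]],
                 cuts: tuple[float, float],
                 lowered: tuple[float, float]) -> list[int]:
    """IDs whose pass/fail decision changes when the cut scores are lowered."""
    return [cid for cid, (w, t) in candidates.items()
            if passes(w, t, cuts) != passes(w, t, lowered)]


candidates = {1: (77, 42), 2: (68, 47), 3: (90, 50), 4: (60, 30)}  # hypothetical
print(reclassified(candidates, cuts=(70, 45), lowered=(65, 40)))  # [1, 2]
```

Half of these hypothetical candidates change classification from a five-point shift, which mirrors the borderline reclassification visible in the plots.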

For many standardized commercial personnel tests, five points can produce large
discrepancies between true positives and false negatives. For appropriate personnel placement,
perhaps a range of acceptable values should be used to minimize false classifications.
The ultimate goal of any testing program would be to reduce false negatives and
increase true positives. A range of cut scores based on the standard error
of measurement for the test might help. In addition to maximizing human resource
potential, this method of setting cut scores could save a company significant funds through
reduced recruitment efforts and less staff time spent on candidate evaluations.



Summary 

In summary, two data samples were evaluated with respect to classification schemes,
quartile ranking and cut score differences. The results demonstrate that ranking
candidates overall may result in substantial losses in cost and staff time, given the 28% of
misclassified candidates found in this study. Although arbitrary sample sizes did not appear
to affect the classification of an individual candidate, cut score variation did result in significant
changes in classification. When cut score changes as small as five points were made,
candidates were reclassified.

Future investigations should explore differences in subtest scoring, measurement
significance and, in particular, the standard error of measurement of individual tests. The research literature
documents a large number of litigation cases concerning the use of multiple regression techniques
for personnel selection. Partly because of this vulnerability to litigation, many companies have
turned to rank ordering employees and selecting a specific top proportion of candidates. However,
employer vulnerability still exists with regard to the methods used in setting cut scores and classifying
applicants.
Estimates of Candidate Selection Based on Sample Size

Table 4

Estimate of Frequency of Non-Selection Based on Five Points Below Cut-off Score



Category 1

ITERATIONS  N    ID     FREQUENCY  TOTAL TEST  TYPING
50          20   2856   15         74          45*
50          20   2917   11         77          42
50          20   2889   8          78          43
50          20   2903   12         76          41
50          20   2922   11         89          43
50          20   2885   13         84          45*
50          20   2924   16         84          41
50          20   2866   12         79          45*
50          20   ...    13         96          42
50          20   2916   8          67          19
50          20   2851   8          92          45*
TOTAL

Category 2

ITERATIONS  N    ID     FREQUENCY  TOTAL TEST  TYPING
...         20   ...    13         83          47
...         ...  ...    3          ...         49
50          20   ...    16         80          46
...         20   ...    13         87          49
50          20   ...    11         71          46
50          20   ...    19         83          49
50          20   ...    17         80          46
50          20   2867   14         93          46
50          20   2931   13         91          47
50          20   ...    13         91          46
50          20   2919   12         81          47
50          20   ...    13         73          37
50          20   2932   10         71          27
50          20   ...    13         86          50
50          20   ...    14         74          65
50          20   ...    16         84          49
50          20   ...    14         71          24
50          20   ...    10         74          ...
...         ...  ...    7          92          ...